Edinburgh Research Explorer

Caltech Authors

Literature curation of protein interactions: measuring agreement across major public databases

Author: A. L. Turinsky
B. Turner
Bader
Bader
Bader
Bairoch
Ceol
Charbonnier
Chien
Collins
Cusick
Feki
Gavin
Gavin
Guldener
Hermjakob
Ho
Howe
I. M. Donaldson
Jensen
Kerrien
Kleiman
Krogan
Kuhner
Lehner
Leitner
Lievens
Mons
Orchard
Peri
Prieto
Razick
Rual
S. J. Wodak
S. Razick
Salwinski
Salwinski
Stark
Tong
Uetz
von Mering
Publication venue: Oxford University Press

Literature curation of protein interaction data faces a number of challenges. Although curators increasingly adhere to standard data representations, the data that various databases actually record from the same published information may differ significantly. Some of the reasons underlying these differences are well known, but their global impact on the interactions collectively curated by major public databases has not been evaluated. Here we quantify the agreement between curated interactions from 15 471 publications shared across nine major public databases. Results show that on average, two databases fully agree on 42% of the interactions and 62% of the proteins curated from the same publication. Furthermore, a sizable fraction of the measured differences can be attributed to divergent assignments of organism or splice isoforms, different organism focus and alternative representations of multi-protein complexes. Our findings highlight the impact of divergent curation policies across databases, and should be relevant to both curators and data consumers interested in analyzing protein-interaction data generated by the scientific community.
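
The per-publication comparison described in this abstract can be illustrated with a minimal sketch. It assumes each database is a mapping from a publication identifier to the interactions curated from it, and it uses the Jaccard index as the agreement score. Both the data layout and the choice of Jaccard are assumptions made for illustration; the paper's own measure may differ, and all names below are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as full agreement."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def publication_agreement(records_a, records_b):
    """Compare two databases on the publications they share.

    records_a, records_b: dict mapping publication id -> list of
    interactions, each interaction being a tuple of protein identifiers.
    Returns dict: publication id -> (interaction agreement, protein agreement).
    """
    result = {}
    for pub_id in records_a.keys() & records_b.keys():
        # An interaction is treated as an unordered set of participants.
        inter_a = {frozenset(i) for i in records_a[pub_id]}
        inter_b = {frozenset(i) for i in records_b[pub_id]}
        prot_a = set().union(*inter_a)
        prot_b = set().union(*inter_b)
        result[pub_id] = (jaccard(inter_a, inter_b), jaccard(prot_a, prot_b))
    return result


# Toy data (hypothetical identifiers, not taken from any real database).
db_a = {"111": [("P1", "P2"), ("P2", "P3")], "222": [("Q1", "Q2")]}
db_b = {"111": [("P2", "P1"), ("P3", "P4")], "333": [("Z1", "Z2")]}
agreement = publication_agreement(db_a, db_b)
```

Only publication "111" is shared here, so it is the only key in `agreement`. Divergent organism or isoform assignments would show up in such a comparison as different protein identifiers for what is biologically the same interaction.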

OrthoNets: simultaneous visual analysis of orthologs and their interaction neighborhoods across different organisms

Author: A. L. Turinsky
A. Merkoulovitch
B. Turner
Bateman
Costanzo
D. Roudeva
Gavin
J. Greenblatt
J. Vlasblom
Kouzarides
Krogan
O'Brien
Razick
S. J. Wodak
S. Pu
Shannon
Stelzl
Y. Hao
Yu
Zhu
Publication venue: Oxford University Press

Motivation: Protein interaction networks contain a wealth of biological information, but their large size often hinders cross-organism comparisons. We present OrthoNets, a Cytoscape plugin that displays protein–protein interaction (PPI) networks from two organisms simultaneously, highlighting orthology relationships and aggregating several types of biomedical annotations. OrthoNets also allows PPI networks derived from experiments to be overlaid on networks extracted from public databases, supporting the identification and verification of new interactors. Any newly identified PPIs can be validated by checking whether their orthologs interact in another organism.

iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database

Author: A Ceol
A Clauset
A Ruepp
A Stojmirovic
AL Turinsky
Antonio Mora
B Aranda
B Turner
C Alfarano
C Stark
G Csardi
GD Bader
I Xenarios
Ian M Donaldson
J Yu
KR Brown
P Braun
P Pagel
RM Ewing
S Kerrien
S Razick
TS Keshava Prasad
U Guldener
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The iRefIndex addresses the need to consolidate protein interaction data into a single uniform data resource. iRefR provides the user with access to this data source from an R environment. Results: The iRefR package includes tools for selecting specific subsets of interest from the iRefIndex by criteria such as organism, source database, experimental method, protein accessions and publication identifier. Data may be converted between three representations (MITAB, edgeList and graph) for use with other R packages such as igraph, graph and RBGL. The user may choose between different methods for resolving redundancies in interaction data and how n-ary data is represented. In addition, we describe a function to identify binary interaction records that possibly represent protein complexes. We show that the user choice of data selection, redundancy resolution and n-ary data representation all have an impact on graphical analysis. Conclusions: The package allows the user to control how these issues are dealt with and communicate them via an R-script written using the iRefR package; this will facilitate communication of methods, reproducibility of network analyses and further modification and comparison of methods by researchers.

NORA - Norwegian Open Research Archives

Reactome: a database of reactions, pathways and biological processes

Author: B. Jassal
B. May
C. Yung
Chen
D. Croft
Demir
Dutta
E. Birney
E. Schmidt
Frazer
Funahashi
G. Gopinath
G. O'Kelly
G. Wu
H. Hermjakob
I. Kalatskaya
Irwin
Jain
Kerrien
Killcoyne
L. Matthews
L. Stein
M. Caudy
M. Gillespie
Montecchi-Palazzi
N. Ndegwa
Novere
P. D'Eustachio
P. Garapati
Pico
R. Haw
Razick
S. Jupe
S. Mahajan
Sherry
V. Shamovsky
Warr
Wiegers
Wu
Wu
Publication venue: Oxford University Press
Publication date: 01/01/2011

Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSICQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice.

CiteSeerX

City University of New York

Cold Spring Harbor Laboratory Institutional Repository

A new, fast algorithm for detecting protein coevolution using maximum compatible cliques

Author: A Rodionov
A Valencia
AK Ramani
Alex Rodionov
Alexandr Bezginov
AM Altenhoff
D MacLeod
D Robinson
Elisabeth RM Tillier
ERM Tillier
ERM Tillier
F Pazos
F Pazos
GW Clark
J Felsenstein
J Felsenstein
Jonathan Rose
K Katoh
MK Kuhner
PRJ Östergård
R Jothi
RG Beiko
RM Karp
S Razick
T Sato
V Soria-Carrasco
W Li
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The MatrixMatchMaker (MMM) algorithm was recently introduced to detect the similarity between phylogenetic trees and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices, and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time. Results: In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved more than 600× speedup with comparable accuracy to the original MMM. Conclusions: MMMvII will thus allow for more extensive and intricate analyses of coevolution. Availability: An implementation of the MMMvII algorithm is available at http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php

Candidate gene prioritization by network analysis of differential expression using machine learning approaches

Author: A Subramanian
A Zanzoni
AJ Smola
AP Francisco
B Aranda
B Harr
Bart de Moor
C Saunders
C Stark
C von Mering
D Nitsch
D Zieker
Daniela Nitsch
F Chung
F Fouss
Fabian Ojeda
GC Cawley
GD Bader
H Yang
HY Chuang
J Chen
JA Hanley
Joana P Gonçalves
JW Park
K Lage
KR Brown
L Franke
L Gautier
L Salwinski
LC Tranchevent
M Liu
P Baldi
P Pagel
R Gupta
RA Irizarry
RI Kondor
RK Nibbe
S Aerts
S Köhler
S Mirkin
S Razick
S Vardhanabhuti
SE Choe
T Fawcett
WK Lim
Y Saad
Yves Moreau
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge (such as known disease genes or disease-related pathways) is available. Performing genetic studies frequently results in large lists of candidate genes of which only a few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data on differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. Results: We have proposed three strategies for scoring disease candidate genes that rely on network-based machine learning approaches: kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking, leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%.
Conclusion: In this study we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype.
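
The simplest of the strategies mentioned above, the local measure based on the expression of a gene's direct neighbors, can be sketched as follows. This is an illustrative assumption and not the authors' implementation: the score used here is the mean absolute differential expression over the direct neighbors, and the function names, network layout and gene identifiers are all hypothetical.

```python
def neighbor_expression_score(network, diff_expr, gene):
    """Mean absolute differential expression of a gene's direct neighbors.

    network: dict mapping gene -> set of neighboring genes.
    diff_expr: dict mapping gene -> differential expression value
               (for example a log fold change); genes may be missing.
    """
    neighbors = network.get(gene, set())
    values = [abs(diff_expr[n]) for n in neighbors if n in diff_expr]
    # A gene with no measured neighbors gets the lowest possible score.
    return sum(values) / len(values) if values else 0.0


def rank_candidates(network, diff_expr, candidates):
    """Order candidate genes from most to least promising."""
    return sorted(
        candidates,
        key=lambda g: neighbor_expression_score(network, diff_expr, g),
        reverse=True,
    )


# Toy network and expression values (hypothetical).
network = {"G1": {"G2", "G3"}, "G4": {"G5"}, "G6": set()}
diff_expr = {"G2": 2.0, "G3": -1.0, "G5": 0.5}
ranking = rank_candidates(network, diff_expr, ["G4", "G6", "G1"])
```

Note that the candidate's own expression value is never consulted; only its neighborhood is. The kernel-based strategies extend the same idea beyond direct neighbors by letting expression signal diffuse through the network.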

Bio::Homology::InterologWalk - A Perl module to build putative protein-protein interaction networks through interolog mapping

Author: A Ceol
A Valencia
A Wiles
AJ Vilella
AJ Walhout
Andrew P Jarman
B Aranda
B Lehner
BJ Breitkreutz
C Prieto
CS Pedamallu
CT Hittinger
D Bray
D Figeys
D Kemmer
DJ LaCount
E Chautard
F He
G Gallone
Giuseppe Gallone
H Hegyi
H Yu
H Yu
HB Fraser
J Douglas Armstrong
J Goll
J Wojcik
JE Stajich
KR Brown
L Giot
L Matthews
LJ Jensen
LR Matthews
M Ashburner
M Michaut
M Persico
MD Adams
NJ Krogan
P Bork
P Flicek
P Kersey
P Shannon
PJ Kersey
R Sharan
RM Ewing
RT Fielding
S Kerrien
S Li
S Razick
S Wuchty
S Wuchty
T Berggård
T Ian Simpson
TKB Gandhi
TW Huang
TW Huang
U Stelzl
X He
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Protein-protein interaction (PPI) data are widely used to generate network models that aim to describe the relationships between proteins in biological systems. The fidelity and completeness of such networks are primarily limited by the paucity of protein interaction information and by the restriction of most of these data to just a few widely studied experimental organisms. In order to extend the utility of existing PPIs, computational methods can be used that exploit functional conservation between orthologous proteins across taxa to predict putative PPIs or 'interologs'. To date most interolog prediction efforts have been restricted to specific biological domains with fixed underlying data sources and there are no software tools available that provide a generalised framework for 'on-the-fly' interolog prediction. Results: We introduce Bio::Homology::InterologWalk, a Perl module to retrieve, prioritise and visualise putative protein-protein interactions through an orthology-walk method. The module uses orthology and experimental interaction data to generate putative PPIs and optionally collates meta-data into an Interaction Prioritisation Index that can be used to help prioritise interologs for further analysis. We show the application of our interolog prediction method to the genomic interactome of the fruit fly, Drosophila melanogaster. We analyse the resulting interaction networks and show that the method proposes new interactome members and interactions that are candidates for future experimental investigation. Conclusions: Our interolog prediction tool employs the Ensembl Perl API and PSICQUIC enabled protein interaction data sources to generate up to date interologs 'on-the-fly'. This represents a significant advance on previous methods for interolog prediction as it allows the use of the latest orthology and protein interaction data for all of the genomes in Ensembl.
The module outputs simple text files, making it easy to customise the results by post-processing, allowing the putative PPI datasets to be easily integrated into existing analysis workflows. The Bio::Homology::InterologWalk module, sample scripts and full documentation are freely available from the Comprehensive Perl Archive Network (CPAN) under the GNU Public license.
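
The general idea of interolog mapping can be shown with a short sketch. It is written in Python for illustration only and does not reproduce the Perl module's API or its orthology-walk procedure. It assumes a list of experimentally observed interactions in a source organism and a one-to-many ortholog map into the target organism; every name and identifier below is hypothetical.

```python
from itertools import product


def predict_interologs(source_ppis, orthologs):
    """Project source-organism interactions onto a target organism.

    source_ppis: iterable of (protein_a, protein_b) pairs observed in the
                 source organism.
    orthologs:   dict mapping a source protein -> list of its orthologs in
                 the target organism (one-to-many is allowed).
    Returns a set of frozensets, each a putative target-organism interaction.
    """
    predicted = set()
    for a, b in source_ppis:
        # Every pairing of an ortholog of a with an ortholog of b is a
        # putative interolog; pairs lacking an ortholog on either side
        # produce no prediction.
        for ta, tb in product(orthologs.get(a, ()), orthologs.get(b, ())):
            predicted.add(frozenset((ta, tb)))
    return predicted


# Toy data: Y* are source proteins, F* are target proteins (hypothetical).
source_ppis = [("Y1", "Y2"), ("Y2", "Y3")]
orthologs = {"Y1": ["F1"], "Y2": ["F2a", "F2b"]}
interologs = predict_interologs(source_ppis, orthologs)
```

Because Y3 has no ortholog in the map, the second source interaction yields nothing. One-to-many orthology inflates the number of predictions, which is one reason a prioritisation score of the kind mentioned in the abstract is useful.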

Edinburgh Research Explorer

SoftPanel: a website for grouping diseases and related disorders for generation of customized panels

Author: A Bravo
A Hamosh
A Liberzon
A Rath
A Subramanian
Cong Zhang
D Croft
D Smedley
D Szklarczyk
GU Ganegoda
J Gillis
Johnathan Watkins
K Lage
KI Goh
LC Tranchevent
Likun Wang
M Ashburner
M Girdea
M Kanehisa
M Oti
MA Driel
Michael McNutt
MJ Bamshad
MJ Cowley
MN Nikiforova
N Gill
RC Deo
S Razick
T Sing
X Wu
X Yao
Y Chen
Y Moreau
Yan Jin
Yuxin Yin
Publication venue: Springer Science and Business Media LLC

University of Toronto Research Repository

An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology

Author: A del Pozo
A Hofer
A Patil
A Schlicker
AC Gavin
AJ Faller
C Pesquita
C Pesquita
C Pesquita
C Pesquita
CM Deane
D Li
D Lin
D Warde-Farley
DP Eisinger
DR Rhodes
F Azuaje
FM Couto
G Alterovitz
G van Rossum
G Yu
Gary D Bader
H Wu
H Yu
H Yu
I Xenarios
J Cheng
JJ Jiang
JL Sevilla
JZ Wang
K Strasser
K Xia
LJ Jensen
M Ashburner
M West
N Nariai
NJ Krogan
P Resnik
P Uetz
P Zhang
R Gentleman
R Shen
S Chavez
S Razick
Shobhit Jain
T Xu
U Stelzl
X Guo
Y Chen
Y Tao
Z Lei
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Semantic similarity measures are useful to assess the physiological relevance of protein-protein interactions (PPIs). They quantify similarity between proteins based on their function using annotation systems like the Gene Ontology (GO). Proteins that interact in the cell are likely to be in similar locations or involved in similar biological processes compared to proteins that do not interact. Thus the more semantically similar the gene function annotations are among the interacting proteins, the more likely the interaction is to be physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies in different classes of cellular location, molecular function, and biological process ontologies of GO and thus may over- or under-estimate similarity. Results: We describe an improved algorithm, Topological Clustering Semantic Similarity (TCSS), to compute semantic similarity between GO terms annotated to proteins in interaction datasets. Our algorithm considers the unequal depth of biological knowledge representation in different branches of the GO graph. The central idea is to divide the GO graph into sub-graphs and score PPIs higher if the participating proteins belong to the same sub-graph than if they belong to different sub-graphs. Conclusions: The TCSS algorithm performs better than the other semantic similarity measurement techniques that we evaluated in terms of their performance on distinguishing true from false protein interactions, and correlation with gene expression and protein families. We show an average improvement of 4.6 times the F1 score over Resnik, the next best method, on our Saccharomyces cerevisiae PPI dataset and 2 times on our Homo sapiens PPI dataset using cellular component, biological process and molecular function GO annotations.
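
For orientation, the Resnik measure that the abstract uses as its baseline can be sketched in a few lines. This is not TCSS. It is the standard information-content similarity, in which two terms are as similar as their most informative common ancestor. The toy ontology, the term probabilities and the function names are hypothetical and chosen only to make the example self-contained.

```python
import math


def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(t1, t2, parents, prob):
    """Resnik similarity: information content of the most informative
    common ancestor, where IC(t) = -log p(t)."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((-math.log(prob[t]) for t in common), default=0.0)


# Toy ontology: child -> list of parents, plus annotation probabilities.
parents = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
prob = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}

same_branch = resnik("A1", "A2", parents, prob)   # common ancestor A
cross_branch = resnik("A1", "B", parents, prob)   # only the root is shared
```

In this toy graph, terms under the same branch score log 2 while terms that share only the root score zero. The limitation discussed in the abstract arises because branches of the real GO graph differ in depth, so raw information content is not directly comparable across them.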